
Conversation

@SamRemis
Contributor

This pull request centralizes and clarifies timeout handling across the async client and transport layers by adding a transport-level error classification API and surfacing transport-detected timeouts as a dedicated ClientTimeoutError. Transports now return an ErrorInfo that indicates whether an exception represents a timeout and whether the fault is client- or server-side. The core async client consults that information and raises ClientTimeoutError when appropriate, so callers see a single, consistent exception for client-side timeouts.

ClientTransport implementations must now implement get_error_info(exception, **kwargs) and return an ErrorInfo indicating whether the exception is a timeout and whether the fault is client- or server-side. This breaking change was required so the core async client can reliably classify transport errors and raise a single ClientTimeoutError for client-side timeouts; that classification is also needed to handle errors as part of retries.
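A rough sketch of the shape this describes, with illustrative names (the review excerpt below annotates the return type as ClientErrorInfo, so the actual names and signatures in the PR may differ):

    from dataclasses import dataclass
    from enum import Enum
    from typing import Any


    class Fault(Enum):
        CLIENT = "client"
        SERVER = "server"


    @dataclass
    class ErrorInfo:
        is_timeout_error: bool = False
        fault: Fault | None = None


    class ClientTimeoutError(Exception):
        """Single, consistent exception surfaced for client-side timeouts."""


    async def _send_with_classification(transport: Any, request: Any) -> Any:
        # Assumes the transport exposes send() and the new get_error_info().
        try:
            return await transport.send(request)
        except Exception as exc:
            info: ErrorInfo = transport.get_error_info(exc)
            if info.is_timeout_error and info.fault is Fault.CLIENT:
                raise ClientTimeoutError(f"Client timeout occurred: {exc}") from exc
            raise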

@SamRemis requested a review from a team as a code owner on October 28, 2025 at 17:05
Contributor

@jonathan343 left a comment


I need to take a deeper look, but I don't think we're catching the new exception everywhere we need to. I hacked around in the CRT client to set connect_timeout_ms to 10. The result is that the SDK keeps trying to send the request indefinitely. When debugging, I see the response is:

Client timeout occurred: AWS_IO_SOCKET_TIMEOUT: socket operation timed out.

Update: This might be specific to the simple retry mode not being able to handle this. I need a bit more investigation time to get a cleaner answer, though.

exceptions represent timeout conditions for that transport.
"""

def get_error_info(self, exception: Exception, **kwargs: Any) -> ClientErrorInfo:
Contributor


I'm a bit hesitant to make this a required piece of ClientTransport. This is a breaking change for existing versions of smithy_http, since existing clients don't implement it. We need to do one of the following:

  • Make this optional and handle it gracefully
  • Include a breaking changelog entry so we know to version bump properly in the next release

I think I prefer the first option.
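A minimal sketch of that first option, assuming the transport may or may not define get_error_info and falling back to an unclassified result (ErrorInfo here is the illustrative type from the sketch above, not the PR's actual code):

    from typing import Any


    def classify_error(transport: Any, exception: Exception) -> "ErrorInfo":
        # Transports built against older smithy_http versions won't have the method;
        # return a default, unclassified ErrorInfo instead of raising AttributeError.
        get_error_info = getattr(transport, "get_error_info", None)
        if get_error_info is None:
            return ErrorInfo()
        return get_error_info(exception)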

Contributor


I don't know if I understand this conceptually. Transport is specifically an abstraction that takes a Request, "sends" it to an external entity or process, and then returns whatever response we get.

How is ErrorClassifyingTransport doing that? This seems like it's a post-processing step on a received response.

Contributor Author

@SamRemis Nov 12, 2025


The rationale behind the current implementation is that each client has its own unique set of exceptions that will be raised in the case of a timeout. These need to be caught and flagged as timeout exceptions so that our retry handlers can adjust the retry behavior appropriately.

Since the errors are specific to each HTTP client implementation, the exception-handling logic needs to live on the client itself, or we need some other way of associating a client with its error info. The get_error_info method would let us extend this in the future to return more information than just whether a given exception is a timeout, but a simpler alternative would be adding an is_timeout_error(exception) method directly to the ClientTransport protocol.

Unfortunately, Protocols in Python don't offer a way of marking a method as optional. We'd either need a default implementation that returns False, or runtime checks like the current hasattr(client, 'get_error_info').

I'm open to refactoring the approach if there's a preference between the two suggestions above, or an alternative idea.
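To make those two fallbacks concrete, a sketch of each, reusing the illustrative ErrorInfo from earlier (neither reflects the PR's actual code):

    from typing import Any, Protocol


    class ClientTransport(Protocol):
        async def send(self, request: Any) -> Any: ...

        # Option A: a default body on the Protocol that classifies nothing. Classes
        # that explicitly subclass ClientTransport inherit it; only purely structural
        # implementations would still need to define the method themselves.
        def get_error_info(self, exception: Exception, **kwargs: Any) -> "ErrorInfo":
            return ErrorInfo(is_timeout_error=False)


    # Option B: keep the method off the Protocol and probe for it at runtime.
    def is_timeout_error(transport: Any, exception: Exception) -> bool:
        if hasattr(transport, "get_error_info"):
            return transport.get_error_info(exception).is_timeout_error
        return False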

Contributor


The rationale behind the current implementation is that each client has its own unique set of exceptions that will be raised in the case of a timeout. These need to be caught and flagged as timeout exceptions so that our retry handlers can adjust the retry behavior appropriately.

I think we can accomplish this a few ways without needing to add this new concept. I think the original proposal for this was adding TIMEOUT_ERRORS and potentially other FOO_ERRORS constants onto each transport and making those part of the Protocol. Then we have a static location to look for this information across all transports and can write a single generalized function for error handling. Did we encounter issues with that approach?

Another option is to have the client's send method do that classification for us with its own error handling, ensuring that the subset of errors that should be classified as timeouts is raised as an error that subclasses a general TimeoutError.
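Sketches of both alternatives, with hypothetical transport names and exception sets (not the actual smithy-python transports):

    import asyncio
    from typing import Any


    # (a) A class-level constant on each transport, made part of the Protocol, so one
    # generalized helper can classify errors for any transport.
    class HypotheticalAioTransport:
        TIMEOUT_ERRORS: tuple[type[BaseException], ...] = (asyncio.TimeoutError,)

        async def send(self, request: Any) -> Any: ...


    def is_timeout(transport: Any, exc: Exception) -> bool:
        return isinstance(exc, getattr(transport, "TIMEOUT_ERRORS", ()))


    # (b) The send method normalizes its own timeouts into a shared TimeoutError subclass.
    class ClientTimeoutError(TimeoutError):
        pass


    class HypotheticalAioTransport2:
        async def send(self, request: Any) -> Any:
            try:
                ...  # perform the request with whatever underlying library this wraps
            except asyncio.TimeoutError as exc:
                raise ClientTimeoutError(str(exc)) from exc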

Contributor Author


TIMEOUT_ERRORS doesn't allow any sort of introspection into the contents of the error; it would only have information about the exception class. For something like the CRT, where all errors are instances of AwsCrtError but have varying name properties, there would be no way of determining whether it was actually a timeout error or not.

We could ask the CRT to start raising more helpful errors, but allowing a ClientTransport to do further introspection makes sense to me from an extensibility standpoint.

Considering this, I'll go with the second option and add the logic to the client's send method if no one has any objections. Thank you for the suggestions.
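For illustration, a get_error_info along those lines for a CRT-backed transport could key off the error's name rather than its class (the timeout-name set here is a guess seeded from the error quoted earlier, and ErrorInfo/Fault are the illustrative types from the first sketch, not the PR's actual code):

    from typing import Any

    from awscrt.exceptions import AwsCrtError


    # Every CRT failure is an AwsCrtError; only the name distinguishes a timeout.
    _CRT_TIMEOUT_ERROR_NAMES = {"AWS_IO_SOCKET_TIMEOUT"}


    class CRTTransportSketch:
        def get_error_info(self, exception: Exception, **kwargs: Any) -> "ErrorInfo":
            if isinstance(exception, AwsCrtError) and exception.name in _CRT_TIMEOUT_ERROR_NAMES:
                return ErrorInfo(is_timeout_error=True, fault=Fault.CLIENT)
            return ErrorInfo()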
